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ABSTRACT 

The  Gibbs  sampler  has  been  proposed  as  a  general  method  for  Bayesian  calculation 
in  Gelfand  and  Smith  (1990).  However  experience  to  date  is  almost  exclusively  in 
applications  assuming  conjugacy  where  implementation  is  reasonably  straightforward. 
This  paper  describes  a  tailored  rejection  method  approach  for  implementation  of  the  Gibbs 
sampler  when  nonconjugate  structure  is  present.  Several  challenging  applications  are 
presented  for  illustration. 
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1.  Introdnction 

In  earber  work  (Gelfand  and  Smith,  1990;  Gelfand  et.  al.,  1989)  a  sampbng  based 
approach  using  the  Gibbs  sampler  (Geman  and  Geman,  1984)  was  offered  as  a  means  for 
implementing  Bayesian  data  analysis.  This  approach  is  very  broadly  appbcable  but 
experience  to  date  resides  almost  exclusively  in  appbcations  assuming  conjugacy.  Two 
notable  exceptions  are  Zeger  and  Karim  (1989)  and  Racine— Poon  et.  al.  (1990).  By  way  of 
clarification,  in  the  context  of  a  hierarchical  Bayes  model,  conjugacy  is  taken  to  mean  that 
for  any  parameter  in  the  model  specification  (bkebhood  *  prior),  integration  of  this  model 
with  respect  to  this  parameter  may  be  achieved  expbcitly.  This  pragmatic  definition 
differs  a  bit  from  formal  versions  in  e.g.  Diaconis  and  Ylvisaker  (1979)  or  in  Morris  (1983). 

Conjugacy  allows  us  to  simpbfy  the  implementation  of  the  Gibbs  sampler  enabbng 
almost  routine  fuUy  Bayesian  analysis  of  many  standard  problems.  However,  more 
chaUenging  modebng  situations  will  not  aUow  conjugagy  as  the  following  examples  suggest: 

(i)  reference  priors  (Bernardo,  1979;  Berger  and  Bernardo,  1989)  and  other  versions  of 
"noninformative"  priors  (Berger,  1985)  wiU  not  be  conjugate  with  the  bkelihood. 

(ii)  nonbnear  models,  including  generalized  linear  models,  will  have  likdihoods  as 
functions  of  the  model  parameters  which  typically  do  not  admit  a  conjugate  form. 

(iii)  for  hierarchical  models,  according  to  Berger  (1985,  p.  232)  "the  choice  of  a  form  for 
the  second  or  higher  stage  prior  seems  to  have  relatively  little  effect".  However, 
this  is  usually  not  the  case  at  the  first  stage  of  prior  specification  where  the  form  of 
the  prior,  e.g.,  body  and  tails  will  have  substar'.iat  euec«,  on  the  inference.  To  assess 
model  robustness  requires  fully  Bayesian  analysis  when  the  first  stage  prior  is 
nonconjugate. 

To  carry  out  the  Gibbs  sampler  in  the  presence  of  nonconjugacy  for  at  least  some  of 
the  model  parameters  requires  sampling  from  nonstandardized  densities  as  discussed  below. 
A  means  for  accompbshing  such  sampling  is  the  rejection  method  (Devroye,  1986;  Ripley, 
1986).  The  rejection  method  is  formally  defined  in  Section  3.  The  purpose  of  this  paper  is 
to  describe  a  tailored  general  rejection  method  approach  for  implementation  of  the  Gibbs 
sampler  when  some  nonconjugate  structure  is  present. 

To  clarify  how  nonstandardized  densities  arise  we  note  that  the  Gibbs  sampler 
requires  independent  draws  from  the  complete  conditional  distributions  of  the  model 
parameters  (see  Sec.  2).  For  any  parameter  in  any  hierarchical  model,  its  complete 
conditional  distribution  is  the  conditional  distribution  of  the  parameter  given  the  data  and 
all  other  model  parameters.  But,  it  is  then  clear  that  for  each  model  parameter  its 
complete  conditional  density  is  proportional  to  likelihood  '  prior.  Often  the  hierarchical 
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structure  along  with,  for  example,  exchangeability  assumptions  greatly  simplify  these 
distributions. 

In  the  next  section  we  briefly  review  the  Gibbs  sampler.  Since  it  is  a  replicated 
iterative  Markovian  updating  scheme,  the  conditional  levels  for  each  complete  conditional 
distribution  which  needs  to  be  sampled  vrill  change  with  each  iteration  and  replication. 
Standard  use  of  the  rejection  method  requires  that  a  distinct  envelope  function  be 
developed  for  each  such  sampling.  Unfortunately  this  envelope  is  used  to  generate  but  one 
observation.  As  an  alternative,  in  Section  3  we  first  note  that  a  good  multidimensional 
envelope  density  will  provide  good  complete  conditional  envelope  densities.  We  then  show 
how  to  create  such  a  multidimensional  envelope  density  which  also  possesses  complete 
conditional  distributions  that  are  easy  to  sample. 

In  Section  4  we  illustrate  with  three  demanding  modeling  applications.  FinaUy  in 
Section  5  we  summarize  noting,  in  addition,  when  our  proposed  method  is  likely  to  work 
well  and  when  not. 

2.  The  Gibbs  Samnlpr 

For  convenience,  in  this  section  densities  will  be  denoted  generically  by  square 
brackets,  so  that  joint,  conditional  and  marginal  forms  for  random  variables  U,V,  appear  as 
[U,V],  [U|V]  and  [V],  respectively.  The  usual  marginalization  by  integration  is  denoted 
by  forms  such  as  [U]  =  J[U|V]  •  [V].  For  a  collection  of  random  variables  [Uj,U2,.--,Uj^] 

the  complete  conditional  dfusmes  can  thus  be  denoted  by  [U  |U  ,  r  ^  s],  s  =  l,2,...,k,  and 

the  marginal  densities  by  [U^],  s  =  1,2,. ..,k. 

Given  the  ability  to  draw  random  variate  samples  of  from  lUgjUj.,  r#s]  for 

specified  values  of  {U^.,  r^s},  s  =  1,2,.. .,k,  the  Gibbs  sampler  provides  an  iterative  scheme 

which  enables  us  to  make  sample-based  estimates,  [U  ],  of  the  marginal  densities,  [U,],  s 

=  1,2,.. .,k.  The  scheme  proceeds  as  follows:  given  an  arbitrary  starting  set  of  values 
we  draw  from  [U^|U^^V..,u[°)],  then  from  [U2|Up\ 

and  so  on  up  to  from  to  complete  one  iteration 

of  the  scheme.  After  t  such  iterations  we  would  arrive  at  joint  a  sample  (UpV  -.U^^b- 

Geman  and  Geman  (1984)  show,  under  mild  conditions,  that  (u|^^,...,U^^b  ^  (Uj,...,Uj^) 

-  [UpU2,-  -,^"]5]  as  t  -  tE.  Hence  for  t  large  enough,  for  example,  can  be  regarded  as 
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a  sample  variate  from  [U  ].  Parallel  replication  of  this  process  m  times  yields  m  iid 

k-tuples  j  =  1.2,-  ,m.  Note  that  sample  size  at,  say,  the  w— th  iteration 

J  J  /  ^  \ 

may  be  increased  from  m  to  any  specified  size  by  randomly  reusing  the  ^sj 

replacement. 

It  is  shown  in  Gelfand  and  Smith  (1990)  that  a  density  estimate  of  the  form 


(1) 


is  better  than  a  kernel  density  estimate  for  [U  1.  This  is  not  surprising  since  (1)  takes 

advantage  of  the  known  structure  in  the  model  whereas  the  kernel  density  estimate  does 
not.  The  form  (1)  is  a  discrete  mixture  distribution  and  is  essentially  a  Monte  Carlo 
integration  to  accomplish  the  desired  marginalization.  Extension  to  expectations, 
E[h(Ug)],  and  more  generally  to  densities  and  expectations  for  functions  W(Uj,  U2,...,Uj^) 

is  straightforward  (see  Gelfand  and  Smith,  1989  for  details). 

In  the  Bayesian  context  the  are  the  unknown  parameters  (or  possibly 

unobserved  data)  in  the  model,and  W  would  be  any  function  of  the  parameters  (or 
unobserved  data)  which  is  of  interest.  All  distributions  are  viewed  as  conditional  on  the 
observed  data,  whence  the  marginal  densities,  [U  ],  become  the  desired  marginal  posterior 

d-sinbiiii'-ns  of  the  parameters  (or  unobserved  data).  Moreover,  the  joint  der.sit, 
[Up...,Uj^]  becomes  the  joint  density  of  all  the  model  parameters/unknowns  given  the 

observed  data.  This  density,  only  known  modulo  normalizing  constant,  will  be  denoted  by 
f(Up...,Uj^)  where  f  is,  in  fact,  likelihood  '  prior.  Similarly  all  complete  conditional 

distributions  will  again  be  proportional  to  f  and,  in  the  absence  of  conjugacy,  will  not  lead 
to  familiar  standard  forms  such  as  normals  and  gammas.  As  a  result,  though  the  forms  of 
the  densities  required  for  use  of  the  Gibbs  sampler  are  known,  sampling  will  require 
random  generation  from  nonstandardized  densities.  In  Section  3  we  suggest  an  approach  to 
accomplish  this  using  a  tailored  version  of  the  rejection  method. 

As  noted  above  we  prefer  to  use  a  density  estimate  of  the  form  (1).  In  fact  using 
this  form  allows  m  to  be  much  smaller  (say  m  =  100)  than  needed  for  kernel  density 
estimates  (say  m  =  5000).  However,  calculation  of  (1)  will  require,  at  the  last  iteration,  m 
normalizations  of  f  which  in  turn  requires  m  one-dimensional  numerical  integrations. 
Simple  trapezoidal  or  Simpson’s  rule  integration  to  do  this  is  quite  fast  still  yielding 
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substantial  overall  savings  in  run  time  compared  with  kernel  density  estimation. 

Finally,  we  note  that  complete  implementation  of  the  Gibbs  sampler  requires  that  a 
determination  of  t  be  made  and  that,  across  iterations,  choice(s)  of  m  be  specified.  In  a 
challenging  application  some  experimentation  with  different  settings  for  t  and  m  will 
likely  be  necessary.  We  do  not  view  this  as  a  deterrent  since  random  generation  is 
generally  inexpensive  and  since  in  many  cases  there  may  be  no  feasible  alternative.  In  the 
examples  of  Section  4,  convergence  was  evaluated  in  a  univariate  marmer  by  plotting 
marginal  posterior  density  estimates  of  the  form  (1)  five  iterations  apart  in  order  to  judge 
stability.  Typically,  a  somewhat  small  m  is  used  until  convergence  is  concluded,  at  which 
point,  for  a  final  iteration,  m  is  increased  by  an  order  of  magnitude  to  develop  the 
presented  density  estiamte.  We  make  no  claims  for  the  optimality  of  this  procedure. 
Assessment  of  convergence  is  a  complex  issue  which  is  currently  very  much  in  the  empirical 
domain. 

3.  A  Tailored  Rejection  Method 

In  this  section  we  develop  a  specialized  version  of  the  rejection  method  which  is 
well-suited  to  the  sampling  needs  of  the  Gibbs  sampler.  First  we  review  the  basic  rejection 
algorithm  method. 

3.1  The  Rejection  Algorithm 

The  rejection  algorithm  for  a  nonstandardized  micgxdble  density  f(U), 
U  =  (Up...,Uj^)  proceeds  as  follows. 

(i)  Identify  a  density  g(U)  which  may  be  readily  sampled  and  such  that  there  exists 
M  for  which  f(U)/g(U)  <  M  for  all  U. 

(ii)  Generate  U*  from  g(U). 

(iii)  Generate  V  from  a  U(0,1)  distribution. 

(iv)  Accept  U*  if  V  <  f(U*)/Mg(U*),  otherwise  return  to  (ii). 

It  may  be  shown  (see  Devroye,  1986  or  Ripley,  1986)  that  the  distribution  of  U*  is 
f(U)/|f(U)  and  also  that  the  acceptance  probability  associated  with  this  algorithm  is 
M  ^[f(U).  Hence  the  smaller  we  can  make  M,  that  is,  the  more  g  resembles  f,  the  more 


efficient  the  sampling. 


3.2  Split-normal  and  Solit-t  Envelope  Fnnctions 


Implementation  of  the  Gibbs  sampler  requires  sampling  from  f  viewed  as  a 
function  of,  say,  for  fixed  U_j  =  (U2,..-,Ujj)-  However  the  value  of  U_j  changes  with 


each  iteration  and  each  replication, 
that  a  distinct  envelope  function 


Customary  use  of  the  rejection  method  then  requires 
g^  (Uj)  be  developed  for  each  U_j.  Moreover, 


typically  each  such 


6u_j(Ui) 


is  used  to  generate  but  one  observation. 


As  an  alternative,  now  viewing  f  as  a  k— dimensional  function  we  propose,  before 
doing  any  sampling,  to  create  a  single  k— dimensional  density  function  g(U)  which  is  a 

good  envelope  for  f  and  is  such  that  for  each  g  has  complete  conditional  distributions 

which  are  easy  to  sample.  Formalizing  notation  and  still  taking  i  =  1  we  write 

g(Uj,...,Uj^)  =  gi(U2|U_j)  •  g2(y_i)-  Note  that  gi(U2|U_j)  serves  as  an  envelope  for 

the  complete  conditional  distribution  for  TJj  arising  from  f.  That  is,  if  M  is  such  that 

f(U)/g(U)  <  M  for  ail  U  then,  as  a  function  of  Uj  for  fixed  U_p  f/g^  <  M'  = 

Mg2(U_i).  In  practice  g^  g2  and  M'  are  not  calculated;  acceptance  of  is 

* 

determined  by  the  equivalent  test  (iv)  above  evaluating  f  and  g  at  (Up  U_j). 

How  might  a  suitable  g(U;  uk.  developed?  Writing  f(U)  =  likebhood(U)  • 
prior(U),  if  U  is  the  maximum  likelihood  estimate  of  U  we  may  take  g(U)  =  prior(U) 
with  M  =  likelihood(U)  to  implement  the  rejection  method.  This  choice  of  g  has  at 


least  two  drawbacks  .  First,  since  it  only  matches  the  prior,  it  need  not  be  a  good  envelope 
for  f  so  that  very  inefficient  sampling  may  result.  Second,  it  requires  prior(U)  to  be 

proper  (since  we  sample  from  g  in  the  rejection  method)  where  as,  in  fact,  we  only  need  f 
proper.  Hence,  while  this  choice  of  g  may  be  viewed  as  a  possible  backup  we  seek  a 
proper  g  which  more  closely  resembles  f. 

In  the  context  of  noniterative  Monte  Carlo  sampling  with  respect  to  a 
nonstandardized  density,  Geweke  (1989)  proposes  the  use  of  an  importance  sampling 
density  which  is  a  multivariate  split-normal  or  split— t  distribution.  Such  a  density,  g,  is 
designed  to  make  the  variability  of  the  ratio  f/g  over  the  space  of  U  small  under  g 


which  in  turn  makes  the  variance  of  the  Monte  Carlo  integration  small.  Note  that  such  a 
g  is  desirable  for  our  purposes  since  the  less  variable  f/g  is,  the  smaller  M  will  be  whence 
the  greater  the  acceptance  probability  and  the  more  efficient  our  sampling. 

Recall  that  in  the  Bayesian  framework,  modulo  normalization,  f  is  viewed  as  the 
joint  posterior  density  function  of  all  the  parameters  (and  perhaps  any  missing  data)  given 
the  observed  data.  With  an  increasing  amount  of  data,  under  usual  regularity  conditions  f 
is  approximately  a  multivariate  normal  density  up  to  a  proportionality  constant  (see  e.g. 
Berger  1985,  p.  224).  A  convenient  choice  to  approximate  the  mean  of  this  normal 
distribution  is  U,  the  mode  of  f.  With  regard  to  an  approximate  covariance  matrix,  the 


preferred  choice  from  an  asymptotic  viewpoint  is  the  negative  of  the  inverse  Hessian 

evaluated  at  U.  The  Hessian  matrix  H  is  defined  by  H-.  =  In  two  stage 

J  J 

models  we  might  use  the  log  likelihood  rather  than  log  f  which  amounts  to  replacing  H 
by  the  information  matrix  I. 

Often  both  the  H  matrix  and  the  I  matrix  are  difficult  to  obtain  since  they 
require  the  existence  of  and  the  evaluation  of  second  derivatives.  A  differencing  algorithm 
(such  as  in  Kass,  1987)  can  be  used  to  provide  reliable  derivative— free  estimates  for  the 
matrices  H  or  I  thus  avoiding  formal  differentiation.  Since  our  objective  is  only  to 
approximate  the  covariance  matrix,  S,  associated  with  f  we  need  not  use  these 
asymptotic  forms  but  may  instead  adopt  alternative  choices  for  S.  One  simple  approach 
which  avoids  the  differentiation  problem  is  to  approximate  the  surface  log  f(U)  by  a 

quadratic  .uncc'jn,  e.g.  (U  —  U)  V(U  — U)  whence  S  =  — V  .  This  approximation  can 


be  straightforwardly  developed  by  usual  least  squares  methods  fitting  the  quadratic  to  a 
large  set  of  log  f  values  obtained  by  evaluating  f  at  many  points  on  a  k— dimensional  grid. 
It  may  in  fact  prove  easiest  to  first  transform  U  to  the  k-dimensional  unit  square,  obtain 


the  covariance  matrix  estimate  and  then  transform  this  estimate  back  to  the  original  scale 
by  the  delta  method.  When  there  are  strong  correlations  amongst  the  U’s  or  when  log  f 
is  fairly  flat  H,  I  and  V  may  be  nearly  singular  making  inversion  awkward.  This  problem 
can  be  alleviated  appropriate  reparametrization,  i.e.,  transformation  of  U. 

We  note  another  approach  which  avoids  both  differentiation  and  inversion  problems 
but  at  the  expense  of  computational  effort  that  will  become  infeasible  with  increasing 
dimensionality.  We  can  obtain  a  piecewise  uniform  approximation  to  f  and  then  obtain 
the  covariance  matrix  associated  with  this  approximation.  For  simplicity  of  illustration 
assume  f  is  bivariate.  Again,  it  may  be  easiest  to  transform  fUpU2)  to  the  unit  square 


o 

with  resulting  density  proportional  to  h.  Partition  the  unit  square  into  a  grid  of  r 
and  evaluate  h  at  say  the  midpoint  of  each  cell  in  the  grid.  Denoting  these  values  by 


cells 


i,  j  =  l,....r,  replace  the  density  h  by  the  constant  h-j  for  points  in  the  (i,j)^^  cell  to 

obtain  the  piecewise  uniform  approximation  to  h.  Normalization  of  this  approximate 
density  and  calculation  of  its  moments  is  straightforward.  Thus  we  may  approximate  the 
covariance  matrix  associated  with  h  and,  again  using  the  delta  method,  that  associated 
with  f. 


The  above  discussion  suggests  taking  g(U)  to  be  N(U,  E)  for  some  convenient  E. 


However  for  more  interesting  situations  involving  small  to  moderate  amounts  of  data, 
although  f  will  typically  still  be  unimodal,  it  will  likely  be  somewhat  asymmetric  and  our 
choice  of  E  will  likely  be  a  weak  covariance  approximation.  Geweke  (1989)  suggests  that 
an  appropriate  split-normal  or  split-t  distribution  be  used  in  place  of  N(U,  E).  We  now 

develop  the  details  of  this  approximation  for  our  situation  including  the  required  complete 
conditional  distributions. 

A  standard  univariate  split-normal  distribution  denoted  by  SN(0,q,r)  is  defined  by 
the  density 


rj_  e""^/2q2 

J_ 

,'2¥r 


z  >  0 
z  >  0 


To  generate  Z-SN(0,q,r)  we  draw  f-r7(0,l)  and  take  Z  =  qt  if  f>0,  Z=r€  if  £<0.  Let 
t}  =  (Zj,...,Zj^)  be  a  random  vector  such  that  Zj  are  independent  with  Zj-SN(0,qj,r.). 

A  general  multivariate  split  normal  arises  by  affine  transformation  of  Z.  In  particular,  it 

is  proposed  to  take  g  to  be  the  distribution  of  U  =  U  +  E^^^Z  for  q.,  given  below. 

Choices  for  q^  and  r  are  intended  to  make  g  a  better  envelope  than  N(U,  E). 

Geweke  proposes  that  q.  =  supv.(A),  r-=  supv.(A)  where 

'  A>0  '  ’  A<0  ' 


j2(log  f(U)  -  log  f(U+nE‘/2Al'))) 


(2) 
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and  is  a  unit  vector  m  the  i''^  coordinate  direction.  Geweke  notes  that  the  a’s 

yielding  these  maxdm’  :orrespond  to  the  positive  and  negative  values  respectively  along  the 
i^^  coordinate  axis  which  maximize  the  ratio  of  the  rate  of  decline  of  f, 
f(U+AE*/^A^*^)/f(U),  to  the  rate  of  decline  of  g,  g(U+AE'/^A(^Vg{U).  Choice  of  q^,  r  in 

this  i-^armer  gives  f/g  the  same  value  at  a  such  that  qj=  sup  (V.(a))  (r.=  sup  (V.(a)))  as 

A>0  ^  A<0  ' 

at  A  =  0*^(0”).  Such  matching  aids  in  making  f/g  "more  constant"  in  each  coordinate 
direction.  Exact  calculation  of  Qj,  r^  is  an  analytical  problem  generally  without  explicit 

solution.  Practically,  these  values  are  obtained  only  approximately  by  evaluating  Vj(a) 

over  the  set  {a  =  j/2,  j  =  ±  1,  ±  2,.. .,±12}. 

The  tail  behavior  associated  with  the  form  of  f  might  suggest  that  a  better  choice 
for  g  would  be  a  multivariate  split— t  distribution.  A  standard  univariate  split— t  with 
i/d.f.,  ST(i/;  0,q,r)  arises  as  the  distribution  of  t  =  Zf\VJu  where  Z-SN{0,q,r) 
independent  of  V,  a  x  random  variable  with  u  d.F.  To  generate  t-ST(:/;  0,q,r)  we 
draw  ^-t^  and  take  t  =  q(f  if  ^  >  o,  t  =  r^  if  ^  <  o.  More  generally  let  t  =  (tp...,tj^) 

be  a  random  vector  where  tj  =  2.J\VJu  with  independent,  ZpSN(0,q-,r.) 

independent  of  A  general  multivariate  split— t  arises  by  affine  transformation  of  t. 

In  particular  it  is  proposed  to  take  g  to  be  the  distribution  of  U  =  U  +  E^^^t  with  qj  and 

r.  calculated,  replacing  Vj(A)  in  (2)  with 

v.(a)  =  -■  - .  -  -  -  (3) 

ji/((f(U)/f(U+AE‘/2A^‘^))^/^‘''^‘'^-l) 


Remarks  below  (2)  are  applicable  here. 

We  comment  that  it  seems  preferable  to  transform  (reparametrize)  each  Uj  to 

have  as  support  before  embarking  on  the  creation  of  g  to  "match"  f. 

Returning  to  the  multivariate  split-normal  it  is  perhaps  easiest  to  think  of  the 

1, 

transformation  from  Z  to  U  as  arising  from  2  one-to-one  transformations  determined 

by  the  vector  sgnZ  =  (sgn  Zp  sgn  ZT,...,sgn  Zj^).  Index  these  transformations  by  j  = 
k  k 

1.2, ....2  with  assoaated  partitions  of  R  denoted  by  .A-  On  A.  there  will  be  an 

•)  J 

2  2 

associated  set  of  q’s  and  r’s.  In  fact  let  dj-  =  q-  if  sgn  Z.  =  1  on  Aj,  r^  if  sgn  Zj  =  -1 


-ic¬ 


on  Aj  and  let  Dj  be  a  diagonal  matrix  with  diagonal  entries  dj^.  Then  on  Aj, 
Z-N(0,Dj)  and  thus  the  density  for  Z  is,  in  obvious  notation 


h(Z)  =  E  N(0,D.)(Z)  •  (Z 
i=i  j 


(Z) 


(4) 


If  B-  is  the  image  of  A.  under  the  transformation  U  =  U  +  E^/^Z  then  the  density  for 
J  3 

U  is 


2^ 

g(u)  =  r  N(U,  sVjd  (!;V!)‘)(U) .  i  (U)  (5) 

j=l  j 

The  Gibbs  sampler  requires  sampling  from  the  complete  conditional  distributions 
associated  with  f.  By  earlier  remarks,  this  requires  sampling  from  the  complete 
conditional  distributions  associated  with  g.  But,  for  example,  what  is  g(U^|U_^)  for  the 

density  (5)?  We  now  show  that  this  distribution  is  a  univariate  split  normal  which  can  be 
easily  sampled.  Choose  for  E^^^  the  upper  triangular  (Cholesky)  decomposition  of  E 
which  we  denote  by 


T  = 


Note  that,  by  using  T,  U_^  uniquely  determines  Z_j  =  T22(U_^  -  U_j).  Furthermore 
Uj  =  Wj(U_,)  +  TjjZj  whce  W,(U_j)  =  Uj  +  TjjT22(U_j -U_j).  Hence  U,|U_j 
■  Uj|Z_j  -  (Wj(U_,)  +  T;jZ,)|Z_,  -  VVj(U_|)  +  T^jZj  i.e.  Uj|U_j  has  a 
univariate  split  normal  distribution.  Moreover  Uj|U_j  is  easily  sampled  by  drawing 
-  SN(0,qj,rj)  and  making  the  linear  transformation  Wj(U_j)  +  T^^Z^.  We  remark  that 
T  and  U  do  not  change  from  iteration  to  iteration,  from  replication  to  replication.  Given 
we  need  only  calculate  W(U_^)  which  just  involves  linear  operations  on  U_j. 

In  the  multivariate  split-t  case  t  =  (tp...,tj^)  arises  from  t.  =  Z-J\YJv  with  Z. 


.  2 


independent,  Zj  -  SN(0,qprp  independent  of  V  -  Then,  analogous  to  (4),  with 


obvious  notation 


for  U  =  U  + 


h(t)=  E  t^(0,D.)(t)  ■  l^(t) 
j=l  •’  j 


2 

g(U)  =  E  t  (U,  i‘^^D.(i'^=fy(V)  ■  Ig  (U) 
j=  1  *'  j 


Again  we  require  associated  complete  conditional  distributions.  Careful  evaluation  shows 
that  Uj|U_^  now  has  a  univariate  split— t  distribution.  More  precisely  Uj|U_j  - 

h  '  ST(i/+k-l,  0,  qp  rj)  and  Vj(U_j)  = 

—1  ^2  1/ 

((iz+k-l)  (i^  +  E  1-/e  ))  with  the  Z.  being  components  of  Z  ,  defined  above  and 

2  1  1  1 

2  2 

e.  =  qj  or  r^  according  to  whether  Z.  >  0  or  <0.  Given  we  need  to  calculate 
V,(U_j)  in  addition  to  Wj(U__p. 

We  commented  earlier,  that  in  implementing  the  rejection  method  w'^v’d  test  a 

* 

generated  from  Uj|U_j  by  using  (iv)  in  Section  3.1.  Computation  '•  -  '-^plified  by 
^  -1  -1 

noting  that  g(U)  =  fl  T..  •  h(T  (U  —  U))  with  h  as  in  (4)  or  in  (6)  accordingly. 

i=l  “ 

However  it  still  remains  to  choose  M.  It  seems  natural  to  look  at  the  ratio  f/g  at  the 
mode  U  but,  as  yet,  g  is  undefined  at  U  i.e.  h  is  undefined  at  0.  Let  h{0)  = 

k  _i  _i  — k/2 

a  n  min(q.  ,  r-  )  where  in  the  split— normal  case  c  =  (27r)  '  while  in  the  split— t  case 

i=l  ' 

0=  r(^)/(r(i//2)(H^^^)-  Then  g(U)  =  •  h(0).  Define  M(U)  =  f(U)/g{U). 

For  both  the  split-normal  and  split-i  cases  as  a  result  of  the  way  h  was  chosen  along 
with  its  definition  at  0  M(U)  will  bound  f/g  in  a  neighborhood  of  U.  In  practice, 

choosing  M  =  bM(U)  with  1.2<b<5  has  provided  an  overall  bound  for  f/g.  In  our 

experience  choice  of  d.f.(^)  to  accommodate  the  tail  behavior  of  f  is  more  critical. 
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If,  during  the  course  of  sampling,  a  arises  such  that  f(UQ)/g(U^)  violates  our 

bound  we  do  a  local  search  in  a  neighborhood  of  and  revise  M  accordingly.  Before  M 

is  revised  the  magnitude  of  f(yQ)/Mg(U^)  provides  a  rough  idea  of  the  severity  of  the 

violation.  Of  course  if  a  violation  occurs  then  some  of  the  previously  generated  variates 
might  not  have  been  retained  with  this  revised  M  and  more  importantly  we  were  not 
sampling  from  the  desired  complete  conditional  distribution.  Before  exploring  this  point 
further  suppose  that  the  change  in  M  is  small  (as  is  typically  the  case  in  our  experience) 
so  that  most  of  the  previously  generated  variates  would  still  be  retained.  Then  we  would 
expect  the  joint  distribution  of  U  at  the  current  iteration  of  the  sampler  to  be  closer  to 

the  converged  joint  distribution  than  when  we  started.  Thus  we  would  expect  no 
advantage  to  starting  the  sampler  anew  as  opposed  to  continuing  from  the  current 
iteration. 

Continuing  with  these  ideas,  suppose  for  a  given  M  we  define  ‘ 

f(U)/g(U)  >  M}.  Following  the  argument  which  justifies  the  rejection  method  we  may 
show  that  the  distribution  of  U  is  actually 


(3) 


Unfortunately 


I  f(U)d(U)  >  MP  (5j^j) 


so  that  even  if  Pg(5j^j) 


is  very  small  we  cannot 


be  sure  that  (8)  is  close  to  f(U)//f(U)dU.  Hence  complete  conditional  distributions 

arising  from  (8)  need  not  be  close  to  complete  conditional  distributions  arising  from  f(U). 

More  optimistically,  if  for  example  given  U_j  the  set  of  such  that  f(U)/g(U)  >  M 

is  a  null  set  then  we  are,  in  fact,  sampling  from  the  complete  conditional  distribution  of 
Uj|U_j  arising  from  f. 

We  conclude  this  section  with  an  important  remark.  When  k  is  large. 
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development  of  g(U)  will  be  made  difficult  because  of  complications  in  obtaining  U,  E 

and  T.  However,  in  most  applications  f(U)  is  a  product  of  functions.  Hence,  if  we  need 

to  sample  from  f  viewed  as  a  nonstandaidized  density  for  we  need  only  consider 

the  terms  in  this  product  involving  and  only  the  variables  say  U 2,. appearing 

in  these  terms.  That  is,  we  factor  f(U)  as  f(Uj,...,Uj^)  =  fj(Uj,  •  ^2(^—1) 

that  g  need  only  be  a  k'  dimensional  envelope  function.  Typically,  k'  is  much  smaller 
than  k  as,  for  instance,  in  exchangeable  models. 

4  Rxamnles 

In  this  section  we  apply  the  tailored  rejection  method  to  three  nonconjugate 
modeling  scenarios.  Each  has  been  chosen  to  illustrate  one  or  more  of  points  (i)— (iii)  of 
Section  1. 

4.1  Asymptotic  Regression  Model 

Consider  a  model  having  mean  structure 

X. 

E(Yj)=Q  — /37  \  Q,i5>0,  0  <  7  <  1  (9) 

This  equation  describes  a  growth  curve  which  has  no  inflection  point  and  approaches  an 
asymptote  as  X-  tends  to  infinity.  Models  of  this  type  find  agricultural,  biological,  and 

engineering  application.  To  complete  the  specification  of  the  model,  we  assume 
independent  Yj  -  N(E(Y.),  a  ),  i=l,...,n,  and  adopt  the  vague  prior  Tr{a,0,  r,a)  «  {aa) 

considered  by  Hills  (1989).  While  the  prior  we  have  adopted  is  not  a  reference  prior  in  the 
sense  of  Bernardo  (1979),  it  is  a  vague  prior  in  the  spirit  of  (i)  in  Section  1.  In  any  event, 
the  nonlinear  structure  in  (9)  precludes  conjugate  priors  as  noted  in  (ii)  in  Section  1. 

In  order  to  implement  the  method  of  Section  3,  we  observe  that  f  =  likelihood  “ 
prior  takes  the  form 

f(Q,/?,7,(7)  =  E  (y.-a+/?'^  ^)^}, 

2a  i=l  ’ 

so  that  none  of  the  four  required  complete  conditional  distributions  are  available  in  closed 
form.  Hence  four  split-t  envelopes  will  be  needed. 
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For  numerical  illustration  we  use  a  data  set  from  Ratkowsky  (1983),  displayed 
in  Table  1,  which  tallies  length  (Y)  and  age  (X)  for  27  captured  samples  of  the  sirenian 
species  dugong  (more  commonly  known  as  the  sea  cow).  To  implement  our  method  we 
first  transform  each  of  the  variables  to  by  letting  =  logo,  U2  =  log/3,  =  logit(7) 

=  log(7/(l-7)),  and  =  loga.  We  then  approximate  the  covariance  matrix,  S,  of  U, 

using  the  quadratic  regression  approach  mentioned  above.  We  obtain  four  Cholesky 
matrices  T  from  S  by  permuting  the  elements  of  S  appropriately  to  make  each  of  the 
Uj  in  turn  the  first  element  of  U.  For  this  example  we  chose  split— t  distributions  having 

1/  =  5  d.f. 

[INSERT  TABLE  1  ABOUT  HERE] 

Figures  1(a)— (c)  show  the  marginal  posterior  density  estimates  for  U^,  U2  and 

that  result  from  the  use  of  (1)  on  m=500  Gibbs  iterates  after  completing  t=50  iterations 
of  the  algorithm.  We  remark  that  density  estimates  on  the  original  scales  could  be 
obtained  via  routine  transformation  as  mentioned  above  (Gelfand  and  Smith,  1989).  The 
posterior  modes,  .974,  -.014  and  1.884  are  comparable  to  the  least  squares  estimates, 
.981,  -.028  and  1.932  obtained  by  Ratkowsky  (1983,  p.  96). 

4.2  Hierarchical  Event  Rate  Model 


To  model  arrivals  or  events  occurring  over  known  lengths  of  time  we  may  use  an 
exchangeable  hierarchical  model.  Foi  example,  if  Y-  is  the  number  of  occurrences  over  an 

exposure  time  of  length  t-,  i=l,...,k,  we  might  assume  that  each  Y-  is  a  realization  from 

an  independent  Poisson  process  having  constant  rate  A-,  i.e.,  Y.^?^ 

assume  that  the  A.  are  independent  and  identically  distributed  from  some  second  stage 

distribution  t.  The  conjugate  choice  for  t  would  be  a  gamma  distribution,  so  that  the 
complete  conditional  distributions  for  the  Aj  are  updated  gammas  (see  Gelfand  and 

Smith,  1990).  However,  in  order  to  allow  for  more  dispersion  and  possible  outliers  in  the 
rates,  we  might  prefer  a  lognormal  or  logstudent— t  prior  for  the  A^,  neither  admitting 

closed  form  complete  conditionals  for  the  A-. 

To  develop  the  competing  models  more  explicitly  in  the  gamma  case,  we  have  at  the 
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second  stage  A.^-^  gamma(a,/?),  where  for  convenience  a  is  a  known  tuning 

constant.  At  the  third  stage  of  the  hierarchy,  we  suppose  /?  -  IG(c,d),  where  IG  denotes 
the  inverse  (reciprocal)  gamma  distribution  having  mode  d/(c— 1),  and  c  and  d  are 
known  constants.  In  the  log-t  case,  letting  (■  =  log(Aj)  we  have  where  0 

and  £7  are  unknown  location  and  scale  parameters,  respectively,  and  u>  is  specified  (note 

that  this  parameter  has  nothing  to  do  with  i/,  the  degrees  of  freedom  for  our  envelope 

split— t  distribution).  At  the  third  stage  of  this  model,  we  suppose  0  -  N(^,r  )  and  cr  - 
2  2 

IG(a,b),  0  and  cr  independent,  /r,  r  ,  a  and  b  known.  Taking  u  sufficiently  large  leads 
to  the  lognormal  model  for  Aj  mentioned  above. 

Implementation  of  the  Gibbs  sampler  is  routine  in  the  gamma  case  (see  Gelfand  and 
Smith  1990  for  details).  In  the  log-t  case  none  of  the  (k+2)  necessary  complete 
conditionals  are  standard  distributions  and  hence,  we  apply  the  methods  of  Section  3. 
Since  the  likelihood  factors  into  k  pieces  each  involving  only  Aj,  0  and  a  ,  the  remark  at 

the  end  of  Section  3  may  be  used  to  reduce  the  dimensionality  of  each  of  the  first  k 
component  problems.  However,  to  streamline  the  computer  code  we  chose  to  ignore  these 
savings,  simply  using  the  same  (k+2)-<iimensional  f  function  for  each  parameter  under 
the  parametrization  Uj  =  =  logAj,  i=l,...,k,  and  Here  the 

covariance  matrix  is  approximated  using  a  derivative— free  numerical  Hessian. 

The  data  in  Table  2  are  taken  from  Worledge,  Stringham,  and  McClymont  (1982), 
and  record  the  number  of  failures  of  pumps  over  given  lpr«^t.hc  of  time  for  several  systems  of 
a  certain  nuclear  power  plant.  Gaver  and  0’Muirrhe?.rwjgh  (1987)  also  fit  both  the 
gamma  and  log-t  models  described  above  to  this  data,  but  employ  an  empirical  Bayes 
approach,  using  the  data  to  estimate  all  the  parameters  at  the  second  stage  of  the  model 
instead  of  placing  third  stage  prior  distributions  on  them.  We  make  our  analysis  somewhat 
comparable  in  the  case  of  the  gamma  by  choosing  a  =  a  =  1.802,  the  value  of  the  method 
of  moments  estimator  of  a  based  on  the  marginal  distribution  of  the  data  m(Y|a,/?),  and 

taking  c  =  2.01  and  d  =  1.01,  so  that  0  has  prior  mean  1  and  prior  standard  deviation 

2 

10.  In  the  log-t  case,  we  specify  the  priors  on  0  by  letting  /x  =  — 1,  r  =1,  and  the  prior 

2 

on  o  by  letting  a  =  2.01  and  b  =  1.01  (again,  a  rather  vague  prior  with  mean  1  and 
standard  deviation  10).  We  use  split-t  distributions  in  our  rejection  algorithm,  taking  u 
=  10. 


[INSERT  TABLE  2  ABOUT  HERE] 
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The  estimated  posterior  distributions  for  fp  and  under  the  gamma  model, 

the  log— t  model  with  u  =  o,  and  the  log— t  model  with  u  =  50  (essentially  a  lognormal 
nodel)  are  shown  in  Figures  2(a)  -  (c)  using  (1)  after  t  =  30  iterations  with  m  =  100 
replications.  The  results  are  similar  to  those  of  Gaver  and  O’Muircheartaigh  (1987,  p.  11). 
We  see  that,  as  expected,  the  gamma  model  generally  produces  posterior  distributions  that 
are  more  highly  peaked  and  less  dispersed  Also  note  that  the  gamma  model  seems  to 
encourage  more  shrinkage  to  the  grand  mean  of  the  observed  rates.  This  is  especially  true 
for  Eg,  a  rate  corresponding  to  a  system  having  a  shorter  history  (smaller  tj). 

4.3  Generalized  Logistic  Regression 

As  a  final  illustration  we  consider  a  class  of  models  that  find  broad  application  in 
the  social  and  biological  sciences,  especially  in  the  context  of  dose— response  studies. 
Suppose  we  have  a  Bernoulli  response  variable  Z  and  a  continuous  predictor  variable  W. 
Typically  one  models  the  probability  of  a  response  at  a  given  level  of  the  predictor  as 

P(w)  =  h(s)ds  (10) 

-TD 

where  y  =  (w-/i)/a,  ^  and  a  unknown.  The  most  common  assumption  is  to  let  h  be 
the  logistic  distribution,  which  enables  the  closed  form  expression 

P(w)  =  exp  y/(l+exp  y),  (11) 

i.e.  the  familiar  logistic  regression  model.  Prentice  (1976)  extended  (11)  introducing  the 
class  of  models  generated  by  taking 

-(m^+m^) 

h(s)  =  exp(smj)(l  +  exp(s))  nipm2  >  0,  (12) 

where  Q{- ,-)  represents  the  beta  function.  Prentice  remarks  that  with  appropriate  choice 
of  mpm2  other  familiar  models  for  binary  response  date  emerge.  More  importantly  he 

notes  the  potential  improvement  in  fit  afforded  by  the  additional  parameters.  One 

convenient  special  case  that  enables  an  explicit  form  for  P(w)  is  to  set  m2  =  1,  obtaining 
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P(w)  =  [exp  y/(l+exp  y)]  (13) 

To  effect  a  Bayesian  analysis  using  model  (13)  we  need  to  specify  priors  on  /x,  a 
and  mj.  Broad  modeling  possibilities  arise  by  letting  m^  -  Gamma(aQ,b^),  /x  -  N(c^,dQ), 
2  2 

and  <7  -  IG(eQ,fQ),  m^,  /x,  and  £t  a  priori  independent  and  a^,  b^,  c^,  d^,  e^  and 
known.  If  we  observe  X.  responses  out  of  n^  observations  at  predictor  level  w^,  i=l,...,k, 
f  takes  the  form 


f(M,  nij)  « 


{  n  [P(wj)]  ^  [l-P(wj)] 
i=l  *  ’ 


with  P(w)  as  given  in  (13).  Again,  the  three  complete  conditional  distributions  will  be 
sampled  using  the  rejection  method. 

Our  illustrative  data  set  for  this  model  is  taken  ffom  Bliss  (1935),  and  gives  the 
proportion  of  adult  flour  beetles  killed  after  five  hours  exposure  to  various  levels  of  gaseous 
carbon  disulphide.  These  data,  displayed  in  Table  3,  have  been  much— analyzed  in  the 
literature  since  their  variability  cannot  be  adequately  explained  by  the  standard  logit 


mo'^''!  fll'  We  will  compare  the  posterior  distribution  of  /x  (=LD,q  in  the  dose-respoas. 


context)  under  the  "full"  model  (13)  and  the  "reduced"  model  (11). 


[INSERT  TABLE  3  ABOUT  HERE] 


For  prior  specification,  we  let  a^  =  .25  and  b^  =  4,  so  that  m^  has  prior  mean  1 

(the  "reduced"  value)  and  prior  variance  4.  We  take  rather  vague  priors  on  li  and  a  by 
letting  c^  =  2,  d^  =  10,  e^  =  2.000004,  and  f^  =  .001  (so  that  has  prior  mean  .001 

and  prior  standard  deviation  .5).  Using  the  obvious  parametrization  Uj  =  /x,  U2  =  loga, 

and  Ug  =  log  mp  empirical  evidence  suggests  a  multivariate  split-t  distribution  with  j/  = 

3  to  insure  adequate  domination  of  f  in  the  tails.  The  covariance  matrix  is  approximated 
using  the  quadratic  regression  approach. 

Figure  3(a)  shows  the  posterior  distribution  of  LD^q  under  the  full  model  (solid 
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line)  and  the  reduced  model  (dashed  line)  arising  from  t  =  50  iterations  with  m  =  500 
replications.  The  full  model  posterior  mode  of  1.815  is  very  close  to  the  MLE  value  n  = 
1.818  reported  by  Prentice  (1976).  The  small  overlap  between  the  two  posteriors  in  Figure 
3(a)  is  consistent  with  the  alleged  lack  of  fit  of  the  reduced  model.  Figure  3(b)  shows 
similar  problems  in  estimating  U2  =  log(«^);  its  posterior  distribution  under  the  reduced 

model  is  also  inappropriately  centered  and  too  highly  concentrated.  Still  further  evidence 
of  the  inadequacy  of  the  reduced  model  is  provided  by  the  marginal  posterior  of  Ug  = 

log(mj)  in  Figure  3(c).  Here  we  see  the  value  assumed  under  the  reduced  model,  =  0, 

is  located  in  the  extreme  right  hand  tail  of  the  estimated  posterior. 

■5  SnTnmary  and  (Inmmflnts 

To  carry  out  calculations  needed  for  Bayesian  inference  the  Gibbs  sampler  is 
attractive  in  that,  by  utilizing  complete  conditional  distributions,  multivariate  concerns 
become  univariate  ones.  Moreover  previous  work  (Gelfand  and  Smith,  1990;  Gelfand  et. 
al.,  1989;  Carlin  et.  al.,  1989;  Gelfand  et.  al.,  1990)  shows  this  approach  to  be  a  reasonably 
straightforward  means  for  implementing  fully  Bayesian  inference  under  conjugacy.  This 
paper  demonstrates  that  the  Gibbs  sampler  can  handle  nonconjugate  cases  as  well. 

Nonetheless,  problems  which  plague  other  techniques  for  Bayesian  calculation  (such 
techniques  are  discussed  in  Naylor  and  Smith,  1982,  1988;  Smith  et  al,  1985,  1987;  Tierney 
and  Kadane,  1986;  Geweke,  1989)  will  also  cause  diffir.,i;ie?  'r.-c  the  Gibbs  sampling 
approach.  Such  problems  include  disagreement  between  likelihood  and  prior, 
parametrization  and  flatness  of  the  likelihood,  strong  posterior  dependence  amongst  the 
parameters  and  of  course  high  dimensionality.  In  such  situations,  successful  use  of  the 
Gibbs  sampler  will  require  "tweaking".  The  severity  of  the  above  problems  will  dictate  the 
extent  of  fine  tuning  required.  Hopefully  the  conceptual  simplicity  of  this  iterative, 
univariate  approach  will  simplify  such  matters. 

We  anticipate  (and  our  examples  support  this)  that  the  approach  of  Section  3  will 
be  most  effective  in  situations  involving  complicated  likelihood  but  relatively  low 
dimension  or  in  higher  dimensional  situations  as  described  at  the  end  of  Section  3  where  for 
any  parameter  the  number  of  other  model  parameters  entering  into  its  complete 
conditional  distribution  will  be  somewhat  small. 

Lastly  we  note  that  use  of  this  rejection  method  by  means  of  a  distinct  but 
convenient  envelope  for  each  iteration  and  each  replication  is  described  in  Racine-Poon  et. 
al.,  (1990).  Extension  of  this  approach  certainly  merits  further  investigation  as  does 
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comparison  through  challenging  applications  with  the  approach  of  this  paper. 
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Tahlo  1 


Length  (Y)  Versus  Age  (X)  for  the  Sirenian  Species  Dugong 


X 

1 

1.5 

1.5 

1.5 

2.5 

4.0 

5.0 

5.0 

7.0 

Y 

1.80 

1.85 

1.87 

1.77 

2.02 

2.27 

2.15 

2.26 

2.35 

X 

8.0 

8.5 

9.0 

9.5 

9.5 

10.0 

12.0 

12.0 

13.0 

Y 

2.47 

2.19 

2.26 

2.40 

2.39 

2.41 

2.50 

2.32 

2.43 

X 

13.0 

14.5 

15.5 

15.5 

16.5 

17.0 

22.5 

29.0 

31.5 

Y 

2.47 

2.56 

2.65 

2.47 

2.64 

2.56 

2.70 

2.72 

2.57 

Table  2.  Pump  Failures  (tj  in  thousands  of  hours) 


System  j 

X. 

J 

1 

5 

94.320 

2 

1 

15.720 

3 

5 

62.880 

4 

14 

125.760 

5 

3 

5.240 

6 

19 

31.440 

7 

1 

1.048 

8 

1 

1.048 

9 

4 

2.096 

10 

22 

10.480 

Table  3.  Observed  Flour  Beetle  Mortality  Dat? 

Dosage 

No.  of  Beetles 

CS2 

Killed  Exposed 

1.6907 

6 

59 

1.7242 

13 

60 

1.7552 

18 

62 

1.7842 

28 

56 

1.8113 

52 

63 

1.8369 

53 

59 

1.8610 

61 

62 

1.8839 

60 

60 

Figure  1 .  Estimated  posteriors,  dugong  data 
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Figure  2.  Estimated  posteriors,  pump  failure  data 
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Figure  3.  Estimated  posteriors,  beetle  data 
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ABSTRACT 


The  Gibbs  sampler  has  been  proposed  as  a  general  method  for  Bayesian  calculation 
in  Gelfand  and  Smith  (1990).  However  experience  to  date  is  almost  exclusively  in 
applications  assuming  conjugacy  where  implementation  is  reasonably  straightforward. 
This  paper  describes  a  tailored  rejection  method  approach  for  implementation  of  the  Gibbs 
sampler  when  nonconjugate  structure  is  present.  Several  challenging  applications  are 
presented  for  illustration. 


UNCLASSIFIED 


SECURITY  CLASSIFICATION  OF  THIS  RAGE  CHUM  Osta  CMIataai 


REPORT  DOCUMENTATION  PAGE 

READ  INSTRUCTIONS 

BEFORE  COMPLETING  FORM 

REPORT  NUMaCR 

457 

2.  GOVT  accession  NO. 

2.  RECIPIENT’S  catalog  NUMBER 

4.  1  IT i.t.  {and  SuOilila) 

AN  ITERATIVE  MONTE  CARLO  METHOD  FOR 

NONCONJUGATE  BAYESIAN  ANALYSIS 

S.  TYPE  OF  REPORT  A  PERIOD  COVERED 

Technical 

S.  PERFORMING  ORG.  REPORT  NUMBER 

7.  AUTHORfa; 

Bradley  P.  Carlin  and  Alan  E.  Gelfand 

S.  contract  or  grant  NUMBERfaJ 

N0025-92-J-1264 

s.  performing  organization  name  and  address 

Department  of  Statistics 

Stanford  University 

Stanford,  CA  94305-4065 

10.  PROGRAM  element,  project.  TASK 
AREA  A  WORK  UNIT  NUMBERS 

NR-042-267 

1 1.  controlling  office  name  and  address 

Office  of  Naval  Research 

StatiSwiCb  &  Probability  Program 

Cod.: 

12.  REPORT  DATE 

September  7,  1992 

13-  NUMBER  OF  PAGES  j 

14.  monitoring  agency  name  a  AOORESSCir  dlllatani  Inm  Canimllint  Ollica) 

IS.  SECURITY  CLASS,  (ol  lltia  rapartj 

Unclassified 

ISa.  DECLASSIFICATION/ downgrading 
SCHEDULE 

IS.  OISTRiaUTION  STATEMENT  (ol  thtt  Rmpott) 


Approved  for  public  release;  distribution  unlimited. 


17.  DISTRIBUTION  STATEMENT  {ol  Olo  mOoIrmet  aniarad  In  Block  70.  II  dlllmtcnl  Irom  Report) 


IS.  supplementary  notes 


IS.  (CEY  words  fConilnua  on  roooroo  oldo  11  noeoooarr  and  Mantl/F  Sr  Sloe*  numOor) 

i 

Keywords:  Bayesian  inference.  Gibbs  sampier.  hierarchical  models,  logistic  regression, 
nonlinear  models,  rejection  method 

20.  abstract  (Conlintto  an  toooroo  aide  II  nacaaaair  and  Idanlltr  Or  Slac*  tnmOar) 

See  Reverse  Side 


DD 


FORM 
«  JAN  7} 


1473  edition  of  I  NOV  SB  It  OBSOLKTK 

S/N  0I0}*0I4*6B0I  ' 


UNCLASSIFIED 


(BCURITT  classification  of  this  pace  fWhan  Data  Bntarad) 


